AITopics | sensor modality

MAN TruckScenes: A multimodal dataset for autonomous trucking in diverse conditions

Neural Information Processing Systems | Feb-15-2026, 19:43:53 GMT

The scenes are tagged according to 34 distinct scene tags, and all objects are tracked throughout the scene to promote a wide range of applications.

artificial intelligence, machine learning, object-oriented architecture, (17 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Multi-modal Co-learning for Earth Observation: Enhancing single-modality models via modality collaboration

Mena, Francisco, Ienco, Dino, Dantas, Cassio F., Interdonato, Roberto, Dengel, Andreas

arXiv.org Artificial Intelligence | Nov-20-2025

Multi-modal co-learning is emerging as an effective paradigm in machine learning, enabling models to collaboratively learn from different modalities to enhance single-modality predictions. Earth Observation (EO) represents a quintessential domain for multi-modal data analysis, wherein diverse remote sensors collect data to sense our planet. This unprecedented volume of data introduces novel challenges. Specifically, the access to the same sensor modalities at both training and inference stages becomes increasingly complex based on real-world constraints affecting remote sensing platforms. In this context, multi-modal co-learning presents a promising strategy to leverage the vast amount of sensor-derived data available at the training stage to improve single-modality models for inference-time deployment. Most current research efforts focus on designing customized solutions for either particular downstream tasks or specific modalities available at the inference stage. To address this, we propose a novel multi-modal co-learning framework capable of generalizing across various tasks without targeting a specific modality for inference. Our approach combines contrastive and modality discriminative learning together to guide single-modality models to structure the internal model manifold into modality-shared and modality-specific information. We evaluate our framework on four EO benchmarks spanning classification and regression tasks across different sensor modalities, where only one of the modalities available during training is accessible at inference time. Our results demonstrate consistent predictive improvements over state-of-the-art approaches from the recent machine learning and computer vision literature, as well as EO-specific methods. The obtained findings validate our framework in the single-modality inference scenarios across a diverse range of EO applications.
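The contrastive part of such a co-learning setup can be illustrated with a generic InfoNCE-style loss between paired embeddings from two modalities. This is a minimal sketch, not the authors' loss: the abstract also mentions a modality-discriminative term that is not shown here, and the function names, the temperature value, and the toy "optical" and "radar" embeddings are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def contrastive_loss(z_a, z_b, temperature=0.1):
    """Generic InfoNCE loss: sample i in modality A should match sample i in B.

    z_a, z_b: lists of embeddings, one per sample, in the same sample order.
    The temperature is a hypothetical default, not a value from the paper.
    """
    total = 0.0
    for i, anchor in enumerate(z_a):
        logits = [cosine(anchor, other) / temperature for other in z_b]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum - logits[i]  # negative log-softmax of the true pair
    return total / len(z_a)

# Toy embeddings (hypothetical): two samples seen by two sensors.
optical = [(1.0, 0.0), (0.0, 1.0)]
radar_aligned = [(0.9, 0.1), (0.1, 0.9)]
radar_shuffled = [(0.1, 0.9), (0.9, 0.1)]

print(contrastive_loss(optical, radar_aligned))
print(contrastive_loss(optical, radar_shuffled))
```

The loss is small when the two modalities place the same sample close together and large when the pairing is scrambled, which is the signal that pulls modality-shared information into a common subspace.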

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10994-025-06903-0

2510.19579

Country:

Europe > France (0.14)
Europe > Germany (0.14)

Genre: Research Report > New Finding (0.86)

Industry: Education (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

71ac06f0f8450e7d49063c7bfb3257c2-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems | Oct-10-2025, 05:54:06 GMT

dataset, sensor, vehicle, (14 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.69)
(3 more...)

Sensor Model Identification via Simultaneous Model Selection and State Variable Determination

Brommer, Christian, Fornasier, Alessandro, Steinbrener, Jan, Weiss, Stephan

arXiv.org Artificial Intelligence | Jun-16-2025

We present a method for the unattended gray-box identification of sensor models commonly used by localization algorithms in the field of robotics. The objective is to determine the most likely sensor model for a time series of unknown measurement data, given an extendable catalog of predefined sensor models. Sensor model definitions may require states for rigid-body calibrations and dedicated reference frames to replicate a measurement based on the robot's localization state. A health metric is introduced, which verifies the outcome of the selection process in order to detect false positives and facilitate reliable decision-making. In a second stage, an initial guess for identified calibration states is generated, and the necessity of sensor world reference frames is evaluated. The identified sensor model with its parameter information is then used to parameterize and initialize a state estimation application, thus ensuring a more accurate and robust integration of new sensor elements. This method is helpful for inexperienced users who want to identify the source and type of a measurement, sensor calibrations, or sensor reference frames. It will also be important in the field of modular multi-agent scenarios and modularized robotic platforms that are augmented by sensor modalities during runtime. Overall, this work aims to provide a simplified integration of sensor modalities to downstream applications and circumvent common pitfalls in the usage and development of localization approaches.
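The selection step described above, picking the most likely model from a catalog for an unknown measurement series, can be sketched as a residual comparison. This is only an illustration under strong assumptions: the catalog entries, the scalar measurements, the RMS residual score, and the ratio-based health value are all hypothetical stand-ins, not the method or the health metric defined in the paper.

```python
import math

def rms_residual(model, states, measurements):
    """Root-mean-square error between predicted and observed measurements."""
    squared = [(model(s) - z) ** 2 for s, z in zip(states, measurements)]
    return math.sqrt(sum(squared) / len(squared))

def identify_model(catalog, states, measurements):
    """Return (best model name, health score in [0, 1]).

    The health score here is a hypothetical margin: it approaches 1 when the
    best model fits far better than the runner-up and 0 when they are tied.
    The catalog must hold at least two models.
    """
    scores = sorted((rms_residual(f, states, measurements), name)
                    for name, f in catalog.items())
    (best, best_name), (runner_up, _) = scores[0], scores[1]
    health = 1.0 - best / runner_up if runner_up > 0 else 0.0
    return best_name, health

# Hypothetical catalog: each model maps a robot state (x, y, vx) to a scalar.
catalog = {
    "position_x": lambda s: s[0],
    "range_to_origin": lambda s: math.hypot(s[0], s[1]),
    "velocity_x": lambda s: s[2],
}

states = [(1.0, 2.0, 0.5), (2.0, 2.0, 0.5), (3.0, 1.0, 0.5), (4.0, 0.5, 0.5)]
unknown = [math.hypot(x, y) for x, y, _ in states]  # secretly a range sensor

print(identify_model(catalog, states, unknown))
```

A low health value would flag a selection that should not be trusted, which mirrors the role the abstract assigns to its health metric in catching false positives.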

artificial intelligence, machine learning, sensor model, (17 more...)

arXiv.org Artificial Intelligence

2506.11263

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Austria (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)
Europe > Greece > Attica > Athens (0.04)

Genre: Research Report (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

In-Hand Object Pose Estimation via Visual-Tactile Fusion

Nonnengießer, Felix, Kshirsagar, Alap, Belousov, Boris, Peters, Jan

arXiv.org Artificial Intelligence | Jun-13-2025

Accurate in-hand pose estimation is crucial for robotic object manipulation, but visual occlusion remains a major challenge for vision-based approaches. This paper presents an approach to robotic in-hand object pose estimation, combining visual and tactile information to accurately determine the position and orientation of objects grasped by a robotic hand. We address the challenge of visual occlusion by fusing visual information from a wrist-mounted RGB-D camera with tactile information from vision-based tactile sensors mounted on the fingertips of a robotic gripper. Our approach employs a weighting and sensor fusion module to combine point clouds from heterogeneous sensor types and control each modality's contribution to the pose estimation process. We use an augmented Iterative Closest Point (ICP) algorithm adapted for weighted point clouds to estimate the 6D object pose. Our experiments show that incorporating tactile information significantly improves pose estimation accuracy, particularly when occlusion is high. Our method achieves an average pose estimation error of 7.5 mm and 16.7 degrees, outperforming vision-only baselines by up to 20%. We also demonstrate the ability of our method to perform precise object manipulation in a real-world insertion task. In-hand pose estimation describes the process of determining the position and orientation of an object held within a robotic hand.
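The idea of ICP over weighted point clouds can be sketched in a reduced setting. The example below is a 2D simplification written for illustration, not the paper's 6D algorithm: it alternates nearest-neighbor matching with a closed-form weighted rigid fit, and the per-point weights stand in for how much each modality (for example tactile versus visual points) should contribute. All function names and the toy data are assumptions.

```python
import math

def weighted_rigid_fit(src, dst, weights):
    """Closed-form 2D fit minimising sum_i w_i * |R s_i + t - d_i|^2.

    src and dst are paired point lists; returns (theta, tx, ty).
    """
    total = sum(weights)
    sx = sum(w * p[0] for w, p in zip(weights, src)) / total
    sy = sum(w * p[1] for w, p in zip(weights, src)) / total
    dx = sum(w * p[0] for w, p in zip(weights, dst)) / total
    dy = sum(w * p[1] for w, p in zip(weights, dst)) / total
    num = den = 0.0
    for w, s, d in zip(weights, src, dst):
        ax, ay = s[0] - sx, s[1] - sy  # centered source point
        bx, by = d[0] - dx, d[1] - dy  # centered target point
        num += w * (ax * by - ay * bx)
        den += w * (ax * bx + ay * by)
    theta = math.atan2(num, den)
    c, s_ = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s_ * sy), dy - (s_ * sx + c * sy)

def apply_transform(theta, tx, ty, points):
    """Rotate by theta, then translate by (tx, ty)."""
    c, s_ = math.cos(theta), math.sin(theta)
    return [(c * x - s_ * y + tx, s_ * x + c * y + ty) for x, y in points]

def weighted_icp(src, dst, weights, iterations=10):
    """Align src to dst; weights[i] scales the influence of src[i]."""
    current = list(src)
    for _ in range(iterations):
        matched = [min(dst, key=lambda d: (d[0] - p[0]) ** 2 + (d[1] - p[1]) ** 2)
                   for p in current]
        theta, tx, ty = weighted_rigid_fit(current, matched, weights)
        current = apply_transform(theta, tx, ty, current)
    return current

# Toy data (hypothetical): the target is the source under a small rigid motion.
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 1.0), (3.0, 3.0)]
target = apply_transform(0.05, 0.02, 0.01, source)
weights = [1.0, 1.0, 1.0, 3.0, 3.0]  # e.g. trust the last two points more

aligned = weighted_icp(source, target, weights)
print(max(math.dist(a, t) for a, t in zip(aligned, target)))
```

Raising the weight of a group of points makes the fit favor them when the two groups disagree, which is the lever a fusion module can use when one modality is occluded or noisy.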

artificial intelligence, point cloud, video understanding, (16 more...)

arXiv.org Artificial Intelligence

2506.10787

Country: Europe > Germany > Hesse > Darmstadt Region (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)

Robust sensor fusion against on-vehicle sensor staleness

Fan, Meng, Zuo, Yifan, Blaes, Patrick, Montgomery, Harley, Das, Subhasis

arXiv.org Artificial Intelligence | Jun-9-2025

Sensor fusion is crucial for a performant and robust perception system in autonomous vehicles, but sensor staleness, where data from different sensors arrives with varying delays, poses significant challenges. Temporal misalignment between sensor modalities leads to inconsistent object state estimates, severely degrading the quality of trajectory predictions that are critical for safety. We present a novel and model-agnostic approach to address this problem via (1) a per-point timestamp offset feature (for LiDAR and radar both relative to camera) that enables fine-grained temporal awareness in sensor fusion, and (2) a data augmentation strategy that simulates realistic sensor staleness patterns observed in deployed vehicles. Our method is integrated into a perspective-view detection model that consumes sensor data from multiple LiDARs, radars and cameras. We demonstrate that while a conventional model shows significant regressions when one sensor modality is stale, our approach reaches consistently good performance across both synchronized and stale conditions.
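The two ingredients named above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the `Point` type, the function names, and the uniform delay distribution are assumptions, and staleness is modeled crudely by shifting capture times, whereas real stale data would also reflect an older state of the scene.

```python
import random
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
    z: float
    t: float  # capture time in seconds

def add_timestamp_offset(points, camera_time):
    """Append dt = point time - camera time as an extra per-point feature."""
    return [(p.x, p.y, p.z, p.t - camera_time) for p in points]

def simulate_staleness(points, max_delay, rng):
    """Augmentation: age a whole sensor sweep by one shared random delay.

    The uniform distribution is a placeholder; the paper refers to patterns
    observed in deployed vehicles, which are not reproduced here.
    """
    delay = rng.uniform(0.0, max_delay)
    return [Point(p.x, p.y, p.z, p.t - delay) for p in points], delay

# Toy LiDAR sweep (hypothetical values) and a camera frame at t = 10.05 s.
sweep = [Point(1.0, 2.0, 0.0, 10.00), Point(1.5, 2.0, 0.0, 10.02)]
rng = random.Random(0)

stale_sweep, delay = simulate_staleness(sweep, max_delay=0.1, rng=rng)
features = add_timestamp_offset(stale_sweep, camera_time=10.05)
print(delay, features)
```

Because the offset is attached to every point, a downstream detector can learn how far each return lags the camera image instead of assuming all sensors are synchronized.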

artificial intelligence, information fusion, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2506.0578

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.95)
(3 more...)

Universal Framework to Evaluate Automotive Perception Sensor Impact on Perception Functions

Gamage, A, Donzella, V

arXiv.org Artificial Intelligence | Mar-7-2025

Current research on automotive perception systems predominantly focusses on either improving the sensors for data quality or enhancing the performance of perception functions in isolation. Although automotive perception sensors form a fundamental part of the perception system, value addition in sensor data quality in isolation is questionable. However, the end goal for most perception systems is the accuracy of high-level functions such as trajectory prediction of surrounding vehicles. High-level perception functions are increasingly based on deep learning (DL) models due to their improved performance and generalisability compared to traditional algorithms. Innately, DL models develop a performance bias on the comprehensiveness of the training data. Despite the vital need to evaluate the performance of DL-based perception functions under real-world conditions using onboard sensor inputs, there is a lack of frameworks to facilitate systematic evaluations. This paper presents a versatile and cost-effective framework to evaluate the impact of perception sensor modalities and parameter settings on DL-based perception functions. Using a simulation environment, the framework facilitates sensor modality testing and parameter tuning under different environmental conditions. Its effectiveness is demonstrated through a case study involving a state-of-the-art surround trajectory prediction model, highlighting performance differences across sensor modalities and recommending optimal parameter settings. The proposed framework offers valuable insights for designing the perception sensor suite, contributing to the development of robust perception systems for autonomous vehicles.

prediction, sensor, vehicle, (15 more...)

arXiv.org Artificial Intelligence

2503.05939

Country:

North America > United States (0.14)
Europe > United Kingdom > England > West Midlands > Coventry (0.04)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

AutoMR: A Universal Time Series Motion Recognition Pipeline

Zhang, Likun, Yang, Sicheng, Wang, Zhuo, Liang, Haining, Shen, Junxiao

arXiv.org Artificial Intelligence | Feb-21-2025

In this paper, we present an end-to-end automated motion recognition (AutoMR) pipeline designed for multimodal datasets. The proposed framework seamlessly integrates data preprocessing, model training, hyperparameter tuning, and evaluation, enabling robust performance across diverse scenarios. Our approach addresses two primary challenges: 1) variability in sensor data formats and parameters across datasets, which traditionally requires task-specific machine learning implementations, and 2) the complexity and time consumption of hyperparameter tuning for optimal model performance. Our library features an all-in-one solution incorporating QuartzNet as the core model, automated hyperparameter tuning, and comprehensive metrics tracking. Extensive experiments demonstrate its effectiveness on 10 diverse datasets, achieving state-of-the-art performance. This work lays a solid foundation for deploying motion-capture solutions across varied real-world applications.

dataset, hyperparameter, recognition, (17 more...)

arXiv.org Artificial Intelligence

2502.15228

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > Northern Ireland > County Down > Belfast (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding

Jones, Joshua, Mees, Oier, Sferrazza, Carmelo, Stachowicz, Kyle, Abbeel, Pieter, Levine, Sergey

arXiv.org Artificial Intelligence | Jan-14-2025

Interacting with the world is a multi-sensory experience: achieving effective general-purpose interaction requires making use of all available modalities, including vision, touch, and audio, to fill in gaps from partial observation. For example, when vision is occluded reaching into a bag, a robot should rely on its senses of touch and sound. However, state-of-the-art generalist robot policies are typically trained on large datasets to predict robot actions solely from visual and proprioceptive observations. In this work, we propose FuSe, a novel approach that enables finetuning visuomotor generalist policies on heterogeneous sensor modalities for which large datasets are not readily available by leveraging natural language as a common cross-modal grounding. We combine a multimodal contrastive loss with a sensory-grounded language generation loss to encode high-level semantics. In the context of robot manipulation, we show that FuSe enables performing challenging tasks that require reasoning jointly over modalities such as vision, touch, and sound in a zero-shot setting, such as multimodal prompting, compositional cross-modal prompting, and descriptions of objects it interacts with. We show that the same recipe is applicable to widely different generalist policies, including both diffusion-based generalist policies and large vision-language-action (VLA) models. Extensive experiments in the real world show that FuSe is able to increase success rates by over 20% compared to all considered baselines.

dataset, instruction, modality, (12 more...)

arXiv.org Artificial Intelligence

2501.04693

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

VILENS: Visual, Inertial, Lidar, and Leg Odometry for All-Terrain Legged Robots

Wisth, David, Camurri, Marco, Fallon, Maurice

arXiv.org Artificial Intelligence | Oct-7-2024

We present the visual-inertial-lidar legged navigation system (VILENS), an odometry system for legged robots based on factor graphs. The key novelty is the tight fusion of four different sensor modalities to achieve reliable operation when the individual sensors would otherwise produce degenerate estimation. To minimize leg odometry drift, we extend the robot's state with a linear velocity bias term, which is estimated online. This bias is observable because of the tight fusion of this preintegrated velocity factor with vision, lidar, and inertial measurement unit (IMU) factors. Extensive experimental validation on different ANYmal quadruped robots is presented, for a total duration of 2 h and 1.8 km traveled. The experiments involved dynamic locomotion over loose rocks, slopes, and mud, which caused challenges such as slippage and terrain deformation. Perceptual challenges included dark and dusty underground caverns, and open and feature-deprived areas. We show an average improvement of 62% in translational and 51% in rotational error compared to a state-of-the-art loosely coupled approach. To demonstrate its robustness, VILENS was also integrated with a perceptive controller and a local path planner.

estimation, experiment, robot, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TRO.2022.3193788

2107.07243

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Wiltshire (0.04)
(7 more...)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)